HMM-based synthesis of creaky voice
Abstract
Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To enhance naturalness, speech synthesis systems should be able to generate speech in all its expressive diversity, including creaky voice. The present study aims to integrate our recent developments, including creaky voice detection, prediction of creaky voice from context, and rendering of creaky excitation, into a fully functioning and automatic HMM-based synthesis system. HMM-based synthetic creaky voices are built and evaluated in subjective listening tests. These tests show that the best synthetic creaky voices are rated as more natural and more creaky than a conventional voice. A non-creaky voice is also successfully transformed to use creak by modifying the F0 contour and the excitation of the predicted creaky parts. The transformed voice is rated as equally natural as, and clearly more creaky than, the original voice.
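The transformation step described above (modifying the F0 contour and excitation of the predicted creaky parts) can be pictured with a minimal sketch. The abstract does not give the actual modification rules. Everything here is a hypothetical illustration and not the authors' method:

- the function name `transform_to_creaky`;
- the per-frame creak probabilities `creak_prob`;
- the decision `threshold`;
- the F0 scaling factor `f0_scale`;
- the excitation labels.

```python
from typing import List, Tuple


def transform_to_creaky(
    f0: List[float],
    creak_prob: List[float],
    threshold: float = 0.5,
    f0_scale: float = 0.5,
) -> Tuple[List[float], List[str]]:
    """Hypothetical sketch: lower F0 and switch the excitation type in predicted creaky frames.

    f0: per-frame F0 in Hz, with 0.0 marking unvoiced frames.
    creak_prob: per-frame creak probability from some context-based predictor (assumed).
    threshold, f0_scale: illustrative values, not taken from the study.
    Returns the modified F0 contour and a per-frame excitation label.
    """
    if len(f0) != len(creak_prob):
        raise ValueError("f0 and creak_prob must have the same number of frames")

    new_f0: List[float] = []
    excitation: List[str] = []
    for hz, p in zip(f0, creak_prob):
        if hz > 0.0 and p >= threshold:
            # Voiced frame predicted as creaky: lower F0 and mark it for creaky excitation.
            new_f0.append(hz * f0_scale)
            excitation.append("creaky")
        elif hz > 0.0:
            # Voiced frame predicted as modal: leave F0 unchanged.
            new_f0.append(hz)
            excitation.append("modal")
        else:
            # Unvoiced frame: keep F0 = 0 and use a noise excitation.
            new_f0.append(0.0)
            excitation.append("noise")
    return new_f0, excitation
```

For example, `transform_to_creaky([120.0, 118.0, 0.0, 110.0], [0.1, 0.8, 0.9, 0.6])` returns `[120.0, 59.0, 0.0, 55.0]` and `["modal", "creaky", "noise", "creaky"]`. The unvoiced frame is left alone even though its creak probability is high. Voicing is decided before creak in this sketch.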
Similar articles
Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis
In statistical parametric speech synthesis, creaky voice can cause disturbing artifacts. The reason is that standard pitch tracking algorithms tend to erroneously measure F0 in regions of creaky voice. This pattern is learned during the training of hidden Markov models (HMMs). In the synthesis phase, false voiced/unvoiced decisions caused by creaky voice result in audible quality degradation. In ...
Automatic detection of voice creak
The analysis of large spontaneous speech corpora reveals that creaky mode appears more frequently than expected, especially for young female speakers. Creaky mode usually causes fundamental frequency measurement errors. Creaky voice segments must therefore often be identified manually beforehand to avoid erroneous F0 readings in large speech databases. Various approaches have been proposed to ident...
Modeling the Creaky Excitation for Parametric Speech Synthesis
In order to produce natural-sounding output, corpus-based speech synthesis systems need to be able to properly model the acoustic variability in the corpus. Creaky voice is a voice quality frequently produced in many languages, in both read and conversational speech settings. However, the creaky excitation displays different acoustic characteristics from modal excitation and is, hence, not sui...
HMM-based Classification of Glottalization Phenomena in German-accented English
The present paper investigates the automatic detection of word-initial glottalization phenomena (glottal stops and creaky voice) in German-accented English by means of HMMs. Glottalization of word-initial vowels can be very frequent in German-accented English, as well as in German. Detection and classification of glottalization phenomena is useful in order to obtain a pre-segmentation of speech...
Data-driven detection and analysis of the patterns of creaky voice
This paper investigates the temporal excitation patterns of creaky voice. Creaky voice is a voice quality frequently used as a phrase-boundary marker, but also as a means of portraying attitude, affective states and even social status. Consequently, the automatic detection and modelling of creaky voice may have implications for speech technology applications. The acoustic characteristics of cre...